Microsoft has released Phi-3.5-vision, a lightweight, open-source multimodal AI model that processes both text and image inputs and supports a context length of 128K tokens. It is designed for resource-constrained environments. Its capabilities include image understanding, optical character recognition (OCR), chart parsing, and multi-image summarization, and it delivers strong performance with low latency. The model has 4.2 billion parameters and was trained on high-quality data, with the goal of combining strong performance with privacy. Phi-3.5-vision is one of three models in the release: a lightweight language model, a mixture-of-experts model, and this multimodal model. All three perform well on benchmarks, and the multimodal model shows particularly strong results on image and video processing benchmarks.